-
Notifications
You must be signed in to change notification settings - Fork 39
add advanced haplotype clustering #817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
We now use the snp_effects() function... didnt know that existed, but makes sense |
I also add grey lines to the amino acid plot of the diplotype clustering functions - it looks a lot neater. And the haplotype function can now support multiple transcripts. |
This is super cool, @sanjaynagi. Thanks very much. I am struggling with bandwidth this week but have asked @leehart and @jonbrenas to take a look so we can merge asap. :) |
Awesome, thanks @ahernank |
Just adding a corresponding issue for this #819 |
Many thanks @sanjaynagi - This is great! Pylint is telling me there's a potential glitch around the following code in the # Get SNP genotype allele counts for the transcript, applying snp_query
df_eff = (
self.snp_effects(
transcript=transcript,
)
.query(snp_query)
.reset_index(drop=True)
) Raises problem:
I gather the issue is that class AnophelesSnpFrequencyAnalysis(AnophelesSnpData, AnophelesFrequencyAnalysis): Obviously all the tests pass, and the examples in the notebooks appear to work, so I don't quite understand why something like an We should probably check to see whether this warning from Pylint is right, anyhow. We might need to modify the inheritance chain, e.g. adding |
|
||
# Get SNP genotype allele counts for the transcript, applying snp_query | ||
df_eff = ( | ||
self.snp_effects( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint reports:
Instance of 'AnophelesHapClustAnalysis' has no 'snp_effects' member
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. also not clear why the code was working - but i have added the Frequency class to hapclust class now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Lee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sanjaynagi . The snp_effects
function is in AnophelesSnpFrequencyAnalysis
rather than AnophelesFrequencyAnalysis
, and AnophelesFrequencyAnalysis
only inherits from AnophelesBase
, so probably won't make snp_effects
available to AnophelesHapClustAnalysis
, if I understand correctly.
Also, since AnophelesSnpFrequencyAnalysis
inherits from AnophelesSnpData
, I suspect we might need to put it before AnophelesSnpData
in the AnophelesHapClustAnalysis
base classes list. That made we wonder whether we need AnophelesSnpData
there at all, but when I tried removing it locally, I got a method resolution error.
Unfortunately, I'm starting to suspect that there might be a larger architectural issue here, around inheritance, namely https://en.wikipedia.org/wiki/Multiple_inheritance#The_diamond_problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, my bad. I will replace the import of AnophelesFrequencyAnalysis and see what happens
…alariagen/malariagen-data-python into advanced-haplotype-clustering-05-09-25
Also, here is a notebook with a demo of the function https://colab.research.google.com/drive/1HqDJL0cz9gAKaeXdwYjcGXx-X1GYJznL?usp=sharing |
Looks like we're still waiting to try using I believe class AnophelesHapClustAnalysis(AnophelesSnpFrequencyAnalysis, AnophelesHapData, AnophelesSnpData): We might be able to address the potential diamond inheritance problem in a separate issue. Otherwise, I reckon this looks good to go. |
This PR adds advanced haplotype clustering, allowing users to cut the tree and assign clusters, and to plot haplotype data (amino acid mutations by default) below the tree.
TODO